Neural approaches to spoken content embedding
Comparing spoken segments is a central operation in speech processing.
Traditional approaches in this area have favored frame-level dynamic
programming algorithms, such as dynamic time warping, because they require no
supervision, but they are limited in performance and efficiency. As an
alternative, acoustic word embeddings -- fixed-dimensional vector
representations of variable-length spoken word segments -- have begun to be
considered for such tasks as well. However, the current space of
discriminative embedding models and training approaches, and their
application to real-world downstream tasks, remains limited. We start by
considering "single-view" training losses, where the goal is to learn an
acoustic word embedding model that separates same-word and different-word
spoken segment pairs. Then, we consider "multi-view" contrastive losses. In
this setting, acoustic word
embeddings are learned jointly with embeddings of character sequences to
generate acoustically grounded embeddings of written words, or acoustically
grounded word embeddings.
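
To make the multi-view setting concrete, the following is a minimal PyTorch
sketch of one contrastive objective in this family (the names are ours, and
the thesis explores several loss variants; this is only the simplest
hardest-negative triplet form, assuming paired batches of acoustic and
character-sequence embeddings):

    import torch
    import torch.nn.functional as F

    def multiview_triplet_loss(audio_emb, text_emb, margin=0.5):
        # audio_emb: (B, D) acoustic word embeddings from the audio encoder
        # text_emb:  (B, D) character-sequence embeddings; row i is the
        #            written form of the word spoken in audio row i
        audio_emb = F.normalize(audio_emb, dim=-1)
        text_emb = F.normalize(text_emb, dim=-1)
        sim = audio_emb @ text_emb.t()   # (B, B) cosine similarities
        pos = sim.diag()                 # similarity of matched pairs
        # Mask the positives, then take the hardest negative per row.
        eye = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
        hardest_neg = sim.masked_fill(eye, float("-inf")).max(dim=1).values
        # Each spoken word should end up closer to its own written form
        # than to any other word's, by at least the margin.
        return F.relu(margin + hardest_neg - pos).mean()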
In this thesis, we contribute new discriminative acoustic word embedding
(AWE) and acoustically grounded word embedding (AGWE) approaches based on
recurrent neural networks (RNNs). We improve model training in terms of both
efficiency and performance. We take these developments beyond English to
several low-resource languages and show that multilingual training improves
performance when labeled data is limited. We apply our embedding models, both
monolingual and multilingual, to the downstream tasks of query-by-example
speech search and automatic speech recognition. Finally, we show how our
embedding approaches compare with and complement more recent self-supervised
speech models.
Comment: PhD thesis
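
As a rough illustration of the query-by-example application, a hypothetical
sketch: embed the spoken query and every candidate segment once, then rank
candidates by cosine similarity, replacing per-segment dynamic time warping
with fast vector comparisons. The embed() stand-in below just mean-pools
frame features so the sketch runs end to end; a real system would apply the
trained AWE model.

    import numpy as np

    def embed(segment):
        # Stand-in for a trained AWE model mapping a variable-length
        # (T, D) segment of acoustic features to a fixed-length vector.
        return np.asarray(segment).mean(axis=0)

    def query_by_example(query_segment, candidate_segments, top_k=10):
        q = embed(query_segment)
        q = q / np.linalg.norm(q)
        C = np.stack([embed(s) for s in candidate_segments])
        C = C / np.linalg.norm(C, axis=1, keepdims=True)
        scores = C @ q                       # cosine similarity to the query
        order = np.argsort(-scores)[:top_k]  # best-scoring segments first
        return [(int(i), float(scores[i])) for i in order]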
Visually grounded learning of keyword prediction from untranscribed speech
During language acquisition, infants have the benefit of visual cues to
ground spoken language. Robots similarly have access to audio and visual
sensors. Recent work has shown that images and spoken captions can be mapped
into a meaningful common space, allowing images to be retrieved using speech
and vice versa. In this setting of images paired with untranscribed spoken
captions, we consider whether computer vision systems can be used to obtain
textual labels for the speech. Concretely, we use an image-to-words multi-label
visual classifier to tag images with soft textual labels, and then train a
neural network to map from the speech to these soft targets. We show that the
resulting speech system is able to predict which words occur in an
utterance---acting as a spoken bag-of-words classifier---without seeing any
parallel speech and text. We find that the model often confuses semantically
related words, e.g. "man" and "person", making it even more effective as a
semantic keyword spotter.
Comment: 5 pages, 3 figures, 5 tables; small updates, added link to code;
accepted to Interspeech 2017
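
A minimal sketch of this training setup, with a stand-in architecture rather
than the paper's exact network: the visual classifier's per-word
probabilities for each image serve as soft targets, and the speech network
is trained against them with a binary cross-entropy loss that accepts soft
labels.

    import torch
    import torch.nn as nn

    class SpeechBagOfWords(nn.Module):
        # Maps an utterance of acoustic frames to per-word logits.
        def __init__(self, feat_dim=40, hidden=512, vocab=1000):
            super().__init__()
            self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
            self.out = nn.Linear(hidden, vocab)

        def forward(self, frames):       # frames: (B, T, feat_dim)
            _, h = self.rnn(frames)      # final hidden state: (1, B, hidden)
            return self.out(h.squeeze(0))

    model = SpeechBagOfWords()
    loss_fn = nn.BCEWithLogitsLoss()     # accepts soft targets in [0, 1]

    def train_step(frames, soft_targets, opt):
        # soft_targets: (B, vocab) word probabilities from the visual
        # classifier applied to the image paired with each spoken caption.
        opt.zero_grad()
        loss = loss_fn(model(frames), soft_targets)
        loss.backward()
        opt.step()
        return loss.item()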
What Do Self-Supervised Speech Models Know About Words?
Many self-supervised speech models (S3Ms) have been introduced over the last
few years, improving performance and data efficiency on various speech tasks.
However, these empirical successes alone do not give a complete picture of what
is learned during pre-training. Recent work has begun analyzing how S3Ms encode
certain properties, such as phonetic and speaker information, but we still lack
a proper understanding of knowledge encoded at the word level and beyond. In
this work, we use lightweight analysis methods to study segment-level
linguistic properties -- word identity, boundaries, pronunciation, syntactic
features, and semantic features -- encoded in S3Ms. We present a comparative
study of layer-wise representations from ten S3Ms and find that (i) the
frame-level representations within each word segment are not all equally
informative, and (ii) the pre-training objective and model size heavily
influence the accessibility and distribution of linguistic information across
layers. We also find that on several tasks -- word discrimination, word
segmentation, and semantic sentence similarity -- S3Ms trained with visual
grounding outperform their speech-only counterparts. Finally, our task-based
analyses demonstrate improved performance on word segmentation and acoustic
word discrimination while using simpler methods than prior work.
Comment: Pre-MIT Press publication version
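
As one concrete example of a lightweight analysis in this spirit (the names
and the pooling choice below are ours, not the paper's exact recipe): freeze
the S3M, mean-pool the frame representations within each word segment at a
chosen layer, and fit a linear probe for a segment-level property; comparing
probe accuracy across layers indicates where that information is most
accessible.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def pooled_segment_features(layer_states, segments):
        # layer_states: (T, D) frame representations from one frozen layer
        # segments: list of (start_frame, end_frame) word boundaries
        return np.stack([layer_states[s:e].mean(axis=0) for s, e in segments])

    def probe_layer(train_states, train_segs, y_train,
                    test_states, test_segs, y_test):
        X_train = pooled_segment_features(train_states, train_segs)
        X_test = pooled_segment_features(test_states, test_segs)
        clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        return clf.score(X_test, y_test)   # linear accessibility at this layer

Mean pooling is only one design choice here; finding (i) above suggests the
frames within a segment are not equally informative, so where one pools can
itself change the result.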
Exposure to the BPA-Substitute Bisphenol S Causes Unique Alterations of Germline Function
Concerns about the safety of Bisphenol A (BPA), a chemical found in plastics, receipts, food packaging and more, have led to its replacement with substitutes now found in a multitude of consumer products. However, several popular BPA-free alternatives, such as Bisphenol S (BPS), share a high degree of structural similarity with BPA, suggesting that these substitutes may disrupt similar developmental and reproductive pathways. We compared the effects of BPA and BPS on germline and reproductive functions using the genetic model system Caenorhabditis elegans. We found that, similarly to BPA, BPS caused severe reproductive defects including germline apoptosis and embryonic lethality. However, meiotic recombination, targeted gene expression, whole-transcriptome and ontology analyses, as well as ToxCast data mining, all indicate that these effects are partly achieved via mechanisms distinct from BPA's. These findings therefore raise new concerns about the safety of BPA alternatives and the risk associated with human exposure to mixtures.
Bisphenol exposure induces DNA damage checkpoint kinase CHK-1 activation.
(A) Immunostaining of phosphorylated CHK-1 on mid- to late-pachytene nuclei from dissected gonads of worms exposed to vehicle control (0.1% ethanol), 500 μM BPA, 500 μM BPS, or to their mixture (scale bar, 10 μm). (B) Percentage of examined worms with elevated pCHK-1 in each group. Error bars represent SEM. N = 10 worms per trial, three repeats per treatment group. All tests are based on t statistics. **P < 0.01.